An apparatus and method for correctly pronouncing proper names from text using a computer provides a
dictionary which performs an initial search for the name. If the name is not in the dictionary, it is sent to a filter 
which either positively identifies a single language group or eliminates one or more language groups as the 
language group of origin for that word. When the filter cannot positively identify the language group of origin for 
the name, a list of possible language groups is sent to a grapheme analyzer. Using grapheme analysis, the most 
probable language group of origin for the name is determined and sent to a language-sensitive letter-to-sound 
section. In this section, the name is compared with language-sensitive rules to provide accurate phonemics and 
stress information for the name. The phonemics (including stress information) are sent to a voice realization unit 
for audio output of the name. 
NAME PRONUNCIATION BY SYNTHESIZER



The present invention relates to text-to-speech conversion by a computer, and specifically to correctly 
pronouncing proper names from text. 

Name pronunciation may be used in the area of field service within the telephone and computer 
industries. It is also found within larger corporations having reverse directory assistance (number to name)
as well as in text-messaging systems where the last name field is a common entity.

There are many devices commercially available which synthesize American English speech by computer.
One of the functions sought for speech synthesis which presents special problems is the pronunciation
of an unlimited number of ethnically diverse surnames. Due to the extremely large number of different
surnames in an ethnically diverse country such as the United States, the pronouncing of a surname cannot
be practically implemented at present by use of other voice output technologies such as audiotape or
digitized stored voice.

There is typically an inverse relation between the pronunciation accuracy of a speech synthesizer in its 
source language and the pronunciation accuracy of the same synthesizer in a second language. The United 
States is an ethnically heterogeneous and diverse country with names deriving from languages which range
from the common Indo-European ones such as French, Italian, Polish, Spanish, German, Irish, etc. to more
exotic ones such as Japanese, Armenian, Chinese, Arabic, and Vietnamese. The pronunciation of surnames
from the various ethnic groups does not conform to the rules of standard American English. For example,
most Germanic names are stressed on the first syllable, whereas Japanese and Spanish names tend to
have penultimate stress, and French names, final stress. Similarly, the orthographic sequence CH is
pronounced [č] in English names (e.g. CHILDERS), [š] in French names such as CHARPENTIER, and [k] in
Italian names such as BRONCHETTI. Human speakers often provide correct pronunciation by "knowing" 
the language of origin of the name. The problem faced by a voice synthesizer is speaking these names 
using the correct pronunciation, but since computers do not "know" the ethnic origin of the name, that 
pronunciation is often incorrect. 

A system has been proposed in the prior art in which a name is first matched against a number of
entries in a dictionary which contains the most common names from a number of different language groups.
Each dictionary entry contains an orthographic form and a phonetic equivalent. If a match occurs, the 
phonetic equivalent is sent to a synthesizer which turns it into an audible pronunciation for that name. 

When the name is not found in the dictionary, the proposed system used a statistical trigram model.
This trigram analysis involved estimating a probability that each three letter sequence (or trigram) in a name
is associated with an etymology. When the program saw a new word, a statistical formula was applied in 
order to estimate for each etymology a probability based on each of the three letter sequences (trigrams) in 
the word. 

The problem with this approach is the accuracy of the trigram analysis. The trigram
analysis computes only a probability, and with all language groups being considered as possible
candidates for the language group of origin of a word, the accuracy of the selection of the language group of
origin of the word is not as high as when there are fewer possible candidates.

According to one aspect of the present invention there is provided a method for positively identifying or 
eliminating a language group as a language group of origin for a given word, comprising: 
comparing substrings of graphemes of an input word to a stored set of filter rules until either a match of one
of the substrings to one of the filter rules positively identifies a language group, or any language group is
eliminated when a match of one of the substrings to one of the filter rules indicates a language group is
eliminated from consideration as a language group of origin for the input word; and

producing a list of possible non-eliminated language groups of origin when no language group is positively
identified as the language group of origin or indicating the language group of origin when the language
group of origin is positively identified. 

According to another aspect of the present invention there is provided a method for generating correct
phonemics for a given input word according to a language group of origin of the input word, the method
comprising:

filtering the input word in a filter to identify a language group of origin for the input word or to eliminate at
least one language group of origin for the input word;

sending the input word and a language tag indicating a language group of origin for the input word from the
filter to a letter-to-sound module containing letter-to-sound rules when the filter positively identifies a
language group of origin for the input word;

sending from the filter the input word and any non-eliminated language groups to a grapheme analyser
when a language group of origin for the input word is not positively identified by the filter;

producing a most probable language group of origin for the input word by analysing graphemes in the input
word;

sending the input word and the most probable language group of origin to a subset of the letter-to-sound
module corresponding to the most probable language group;

producing in the subset of letter-to-sound module segmental phonemics for the input word;

sending the segmental phonemics and the language tag from the letter-to-sound module to a stress
assignment section;

producing stress assignment information for the input word in the stress assignment section; and

sending the segmental phonemics and the stress assignment information to a voice realisation unit.

According to this aspect there is also provided apparatus for positively identifying or eliminating a
language group as a language group of origin for a given word, comprising:

a filter rule store which stores a set of filter rules, a first subset of the filter rules positively identifying a
language group, and a second subset of the filter rules eliminating a language group;

a comparator which compares substrings of graphemes of an input word to the first and second subsets of
filter rules until a match of one of the substrings to one of the first subset of filter rules positively identifies a
language group or eliminates any language group when a match of one of the substrings to one of the
second subset of filter rules indicates a language group is eliminated from consideration as a language
group of origin for the input word; and

an output which produces a list of possible language groups of origin when no language group is positively
identified as the language group of origin, and which produces an indication of the language group of origin
when the language group of origin is positively identified.

The present invention solves the above problem by improving the accuracy of the trigram analysis. This
is done by providing a filter which either positively identifies a language group as a language group of
origin, or eliminates a language group as a language group of origin for a given input word. The filtering
method according to the present invention comprises identifying or eliminating a language group as a
language group of origin for an input word according to a stored set of filter rules. The step of identifying or
eliminating a language group includes performing an exhaustive search of the rule set using a right-to-left
scan. Language groups are eliminated when a match of one of these substrings to one of the filter rules
indicates that a language group should be eliminated from consideration as the language group of origin for
the input word. This is done until a match of one of the substrings to one of the rules positively identifies a
language group. When no language group is positively identified as a language group of origin after all of
the substrings for a given input word are compared, a list of possible language groups of origin is produced.
This filter method also produces a positively identified language group of origin when there is a positive
identification.
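
The filtering step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the rule set of the invention: the rule format, the three example rules, the list of language groups and the name `filter_name` are all hypothetical, and the right-to-left scan is approximated by searching each rule's substring from the right end of the word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterRule:
    """Hypothetical rule format (an assumption, not taken from the source)."""
    pattern: str      # grapheme substring; '#' marks a word boundary
    group: str        # language group the rule refers to
    identifies: bool  # True: positive identification, False: elimination


# Illustrative rules only; these are assumptions, not rules from the source.
RULES = [
    FilterRule("CZ", "Japanese", False),
    FilterRule("W", "Italian", False),
    FilterRule("SKI#", "Slavic", True),
]

GROUPS = ["English", "French", "Italian", "Japanese", "Slavic"]


def filter_name(name, rules=RULES, groups=GROUPS):
    """Return (identified_group, candidates).

    identified_group is a language group when a rule positively identifies
    one, otherwise None; candidates lists the groups not eliminated.
    """
    word = "#" + name.upper() + "#"
    eliminated = set()
    for rule in rules:                       # rules are tried in order
        if word.rfind(rule.pattern) == -1:   # search from the right end
            continue
        if rule.identifies:
            return rule.group, [rule.group]  # first positive match wins
        eliminated.add(rule.group)
    return None, [g for g in groups if g not in eliminated]
```

With these made-up rules, `filter_name("Kowalski")` returns `("Slavic", ["Slavic"])`, while `filter_name("Wu")` returns `None` together with every group except Italian, which is the list that would be handed on to the trigram analysis.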

The advantages of using a filter before the trigram analysis include avoiding unnecessary trigram
analysis when filter rules can positively identify a language group as a language group of origin. When no
language group can be positively identified, the filtering method also reduces the chances of an incorrect
guess being made in the trigram analysis by reducing the number of possible language groups in
consideration as the language group of origin. Through the elimination of some language groups, the
identification of a language group of origin is more accurate, as discussed above.

The invention also includes a method for generating correct phonemics for a given input word 
according to the language group of origin of the input word. This method comprises searching a dictionary 
for an entry corresponding to an input word, each entry containing a word and phonemics for that word.
This entry is then sent to a voice realization unit for pronunciation when the dictionary search reveals an
entry corresponding to the input word. The input word is sent to a filter when the input word does not have 
a corresponding entry in the dictionary. 

The next step in the method involves filtering to identify a language group of origin for the input word or
to eliminate at least one language group of origin for the input word. When the filter positively identifies a
language group of origin for the input word, the input word and a language tag indicating a language group
of origin for the input word are sent from the filter to a letter-to-sound module. When a language group of
origin is not positively identified by the filter, the input word and any language groups not eliminated are
sent from the filter to a trigram analyzer.

A most probable language group of origin for the input word is produced by analyzing trigrams
occurring in the input word. This most probable language group of origin produced by the trigram analysis
is sent along with the input word to a subset of letter-to-sound rules that correspond to the most probable
language group. Phonemics are generated for the input word according to the corresponding subset of
letter-to-sound rules.
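
The order of the steps just described (dictionary, then filter, then trigram analysis, then language-specific letter-to-sound rules) can be summarised in a short Python sketch. The function `pronounce` and all of the stand-in components below are hypothetical placeholders introduced for illustration; the returned strings are labels, not real phonemics.

```python
def pronounce(name, dictionary, name_filter, analyzer, rule_blocks):
    """Return phonemics for a name, following the order of steps in the text."""
    word = name.upper()
    if word in dictionary:                 # 1. dictionary hit: stored phonemics
        return dictionary[word]
    group, candidates = name_filter(word)  # 2. filter identifies or eliminates
    if group is None:                      # 3. no positive identification:
        group = analyzer(word, candidates)  #    pick the most probable group
    return rule_blocks[group](word)        # 4. language-specific rules


# Stand-in components (assumptions, for illustration only).
dictionary = {"SMITH": "<stored phonemics for SMITH>"}


def name_filter(word):
    if word.endswith("SKI"):
        return "Slavic", ["Slavic"]
    return None, ["English", "Italian"]


def analyzer(word, candidates):
    return candidates[-1]  # placeholder for the trigram analysis


rule_blocks = {
    group: (lambda w, g=group: "<" + g + " rules: " + w + ">")
    for group in ("English", "Italian", "Slavic")
}
```

Calling `pronounce("Smith", dictionary, name_filter, analyzer, rule_blocks)` short-circuits at the dictionary, whereas `"Kowalski"` is resolved by the filter and `"Vitale"` falls through to the analyzer.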
.0866
In the array above, L is a language group and n is the number of language groups not eliminated by the
filter 12. The trigram #VI has a probability of .0679 of being from language group Li, .4659 of being from the
language group Lj and .2093 of being from language group Ln. Lj has the highest averaged probability and
is thus identified as the language group.

The probability of each of the trigrams of the grapheme string (input name) is similarly input to the
trigram analyzer 14. The probability of each trigram in an input name is averaged for each language group.
This represents the probability of the input name originating from a particular language group. The
probability that the grapheme string #VITALE# belongs to a particular language group is produced as a
vector of probabilities from the total probability line. From this vector of probabilities, other items such as
standard deviation and thresholding can also be calculated. This ensures that a single trigram cannot overly
contribute to or distort the total probability.
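
The averaging described above can be sketched in Python. The sketch assumes that the per-trigram probabilities are available as a nested mapping `table[trigram][group]` and that a trigram missing from the table contributes a `floor` value. Both choices, and the function names, are assumptions made for illustration.

```python
def trigrams(name):
    """Split a name into trigrams, using '#' as the word-boundary symbol."""
    word = "#" + name.upper() + "#"
    return [word[i:i + 3] for i in range(len(word) - 2)]


def average_probabilities(name, table, groups, floor=0.0):
    """Average the per-trigram probabilities of a name for each language group.

    table maps trigram -> {group: probability}; unseen entries count as floor.
    """
    grams = trigrams(name)
    if not grams:
        raise ValueError("name must contain at least one grapheme")
    return {
        group: sum(table.get(t, {}).get(group, floor) for t in grams) / len(grams)
        for group in groups
    }


print(trigrams("VITALE"))  # ['#VI', 'VIT', 'ITA', 'TAL', 'ALE', 'LE#']
```

The resulting dictionary plays the role of the vector of probabilities: one averaged value per language group not eliminated by the filter.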

Although the illustrated embodiment analyzes trigrams, the analyzer 14 can be configured to analyze
different length grapheme strings, such as two-grapheme or four-grapheme strings.

In the example above, the trigram analyzer 14 shows that language group Lj is the most probable 
language group of origin for the given input name, since it has the highest probability. It is this most 
probable language group that becomes the LTAG for the input name. The LTAG and the input name are
then sent to the letter-to-sound section 20 to produce the phonemics for the input. 

The filter rules are constructed in such a way that ambiguity of identification is not possible. That is, a
language group may not be both eliminated and positively identified since a dominance relationship applies such
that a positive identification is dominant over an elimination rule in the unlikely event of a conflict.

Similarly, an input name may not be positively identified for more than one language group because the
filter rules constitute an ordered set such that the first positive identification applies.

The system may default to a certain language group if one of two thresholding criteria is met: (a)
absolute thresholding occurs when the highest probability determined by the trigram analyzer 14 is below a
predetermined threshold Ti. This would mean that the trigram analyzer 14 could not determine from among
the language groups a single language group with a reasonable degree of confidence; (b) relative
thresholding occurs when the difference in probabilities between the language group identified as having
the highest probability and the language group identified as having the second highest probability falls
below a threshold Tj as determined by the trigram analyzer 14.

The default to a specified language group is a settable parameter. In an English-speaking environment,
for example, a default to an English pronunciation is generally the safest course since a human, given a low
confidence level, would most likely resort to a generic English pronunciation of the input name. The value of
the default as a settable parameter is that the default can be changed in certain situations, for example,
where the telephone exchange indicates that a telephone number is located in a relatively homogeneous
ethnic neighborhood.
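
The two thresholding criteria and the settable default can be sketched in Python as follows. The parameter names `t_abs` and `t_rel` stand in for the thresholds Ti and Tj, and the numeric threshold values used in the usage note are arbitrary assumptions, since the text does not give any.

```python
def choose_group(probabilities, default_group, t_abs, t_rel):
    """Pick the most probable language group, or fall back to the default.

    probabilities maps language group -> averaged probability.
    (a) absolute thresholding: the best probability is below t_abs;
    (b) relative thresholding: best minus second best is below t_rel.
    """
    if not probabilities:
        return default_group
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    best_group, best_p = ranked[0]
    if best_p < t_abs:
        return default_group
    if len(ranked) > 1 and best_p - ranked[1][1] < t_rel:
        return default_group
    return best_group
```

Borrowing the figures quoted for the trigram #VI purely as sample numbers, `choose_group({"Li": .0679, "Lj": .4659, "Ln": .2093}, "English", 0.3, 0.1)` returns `"Lj"`. Raising `t_abs` to 0.5, or `t_rel` to 0.3, makes the same call return the default `"English"`.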

As mentioned earlier, the name and language tag (LTAG) sent by either the filter 12 or the trigram
analyzer 14 are received by the letter-to-sound rule section 20. The letter-to-sound rule section 20 is broken
up conceptually into separate blocks for each language group. In other words, language group (Li) will have
its own set of letter-to-sound rules, as does language group (Lj), language group (Lk), etc. to language group
(Ln).

Assuming that the input name has been identified sufficiently so as not to generate a default
pronunciation, the input name is sent to the appropriate language group letter-to-sound block 22i according
to the language tag associated with the input name.
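
Language-keyed processing of this kind can be illustrated with the stress tendencies mentioned in the background (first-syllable stress for Germanic names, penultimate for Japanese and Spanish, final for French). The Python sketch below is only an assumption about how a language tag might select a rule. It is not the stress assignment section of the invention, and the syllabified inputs are supplied by hand.

```python
# Stress tendencies as stated in the background of this text.
STRESS_POSITION = {
    "Germanic": "first",
    "Japanese": "penultimate",
    "Spanish": "penultimate",
    "French": "final",
}


def stressed_syllable(syllables, ltag, default="first"):
    """Return the index of the stressed syllable for a given language tag."""
    if not syllables:
        raise ValueError("at least one syllable is required")
    position = STRESS_POSITION.get(ltag, default)  # unknown tags use default
    if position == "final":
        return len(syllables) - 1
    if position == "penultimate":
        return max(len(syllables) - 2, 0)
    return 0
```

For example, `stressed_syllable(["char", "pen", "tier"], "French")` selects the last syllable (index 2), and `stressed_syllable(["chil", "ders"], "Germanic")` selects the first (index 0).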

In the letter-to-sound rule section 20, the rules for the individual language group blocks 22 are subsets
of a larger and more complex set of letter-to-sound rules for other language groups including English. A
letter-to-sound block 22i for a specific language group Li that has been identified as the language group of
The final table then has four dimensions: one for each grapheme of the trigram, and one for the
language group.
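
One way to picture such a four-dimensional table is as a mapping keyed by three graphemes and a language group. The Python sketch below builds one from per-group name lists using Bayes' Rule, which the claims name for computing the probabilities. The training-list input, the equal prior for every language group and the function name `build_table` are assumptions made for illustration.

```python
from collections import Counter


def build_table(names_by_group):
    """Map (g1, g2, g3, group) -> P(group | trigram), assuming equal priors."""
    likelihood = {}  # group -> {trigram: P(trigram | group)}
    for group, names in names_by_group.items():
        counts = Counter()
        for name in names:
            word = "#" + name.upper() + "#"
            counts.update(word[i:i + 3] for i in range(len(word) - 2))
        total = sum(counts.values())
        likelihood[group] = {t: c / total for t, c in counts.items()}

    table = {}
    all_trigrams = set().union(*likelihood.values())
    for t in all_trigrams:
        # Bayes' Rule; with equal priors the prior terms cancel.
        evidence = sum(likelihood[g].get(t, 0.0) for g in likelihood)
        for g in likelihood:
            table[(t[0], t[1], t[2], g)] = likelihood[g].get(t, 0.0) / evidence
    return table
```

Each key has one coordinate per grapheme of the trigram plus one for the language group, so `table[("#", "V", "I", group)]` would hold the probability that a name beginning with VI comes from that group.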

The trigram probabilities as computed by the block 66 are sent to the language identification and
phonetic realization block 60, and particularly to the trigram analyzer 14 which produces the vector of
probabilities that the grapheme string belongs to a particular language group.

Using the above-described system, names can be more accurately pronounced. Further developments 
such as using the first name in conjunction with the surname in order to pronounce the surname more 
accurately are contemplated. This would involve expanding the existing knowledge base and rule sets. 


Claims 

1. A method for positively identifying or eliminating a language group (Li...Ln) as a language group of
origin for a given word, comprising:

comparing substrings of graphemes of an input word to a stored set of filter rules until either a match of one
of the substrings to one of the filter rules positively identifies a language group, or any language group is
eliminated when a match of one of the substrings to one of the filter rules indicates a language group is
eliminated from consideration as a language group of origin for the input word; and

producing a list of possible non-eliminated language groups of origin when no language group is positively
identified as the language group of origin or indicating the language group of origin when the language
group of origin is positively identified.

2. A method as claimed in claim 1, wherein said comparing step includes the step of searching the filter 
rules from top to bottom and right to left. 

3. A method as claimed in claim 1, wherein the comparing step includes the step of searching the filter
rules by language group and by grapheme within each language group.

4. A method for generating correct phonemics for a given input word according to a language group of
origin of the input word, the method comprising:

filtering the input word in a filter (12) to identify a language group of origin for the input word or to eliminate
at least one language group of origin for the input word;

sending the input word and a language tag indicating a language group of origin for the input word from the
filter to a letter-to-sound module (22) containing letter-to-sound rules when the filter positively identifies a
language group of origin for the input word;

sending from the filter the input word and any non-eliminated language groups to a grapheme analyser (14)
when a language group of origin for the input word is not positively identified by the filter;

producing a most probable language group of origin for the input word by analysing graphemes in the input
word;

sending the input word and the most probable language group of origin to a subset of the letter-to-sound
module corresponding to the most probable language group;

producing in the subset of letter-to-sound module segmental phonemics for the input word;

sending the segmental phonemics and the language tag from the letter-to-sound module to a stress
assignment section (24);

producing stress assignment information for the input word in the stress assignment section; and

sending the segmental phonemics and the stress assignment information to a voice realisation unit (50).

5. A method as claimed in claim 4, wherein the graphemes are trigrams. 

6. A method as claimed in claim 4 or 5, wherein the step of producing a most probable language group
of origin includes the step of computing probabilities of graphemes for an input word being from a particular
language group using Bayes' Rule.

7. A method as claimed in claim 4, 5 or 6, further comprising the step of defaulting to a general
pronunciation when the step of producing a most probable language group of origin produces a most
probable language group of origin having a probability below a predetermined threshold level.

8. A method as claimed in claim 4, 5, 6 or 7, further comprising the step of defaulting to a general 
pronunciation when the step of producing a most probable language group of origin produces a most 
probable language group of origin having a probability that is not greater by a predetermined amount than a 
probability of a next most probable language group of origin. 

9. A method as claimed in any of claims 4 to 8 including first searching a dictionary (10) for an entry
corresponding to the input word, each entry containing a word and phonemics for that word; and
sending an entry to the voice realisation unit for pronunciation when the dictionary searching reveals an
entry corresponding to the input word.
10. Apparatus for positively identifying or eliminating a language group (Li...Ln) as a language group of
origin for a given word, comprising:

a filter rule store (68) which stores a set of filter rules, a first subset of the filter rules positively identifying a
language group, and a second subset of the filter rules eliminating a language group;

a comparator (12) which compares substrings of graphemes of an input word to the first and second
subsets of filter rules until a match of one of the substrings to one of the first subset of filter rules positively
identifies a language group or eliminates any language group when a match of one of the substrings to one
of the second subset of filter rules indicates a language group is eliminated from consideration as a
language group of origin for the input word; and

an output which produces a list of possible language groups of origin when no language group is positively
identified as the language group of origin, and which produces an indication of the language group of origin
when the language group of origin is positively identified.

11. Apparatus as claimed in claim 10 including an analyser (14) for calculating the most probable 
language group of origin for the graphemes in the given word for each language not eliminated by the 
second subset of the filter rules received from the output. 

12. Apparatus as claimed in claim 11 in which the analyser analyses graphemes in the given word
arranged into trigrams.